Hindi Word Sense Disambiguation
Authors
Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India. {manish, mahesh, pb, pandey, yupu}@cse.iitb.ac.in
Abstract
Word Sense Disambiguation (WSD) is defined as the task of finding the correct sense of a word in a specific context. This is crucial for applications like Machine Translation and Information Extraction. While the work on automatic WSD for English is voluminous, to our knowledge, this is the first attempt at automatic WSD for an Indian language. We make use of the Wordnet for Hindi developed at IIT Bombay, which is a highly important lexical knowledge base for Hindi. The main idea is to compare the context of the word in a sentence with the contexts constructed from the Wordnet and choose the best match as the winner. The output of the system is a particular synset number designating the sense of the word. The Wordnet contexts are built from the semantic relations and glosses, using the Application Programming Interface created around the lexical data. The evaluation has been done on the Hindi corpora provided by the Central Institute of Indian Languages, and the results are encouraging. Currently the system disambiguates only nouns; work on other parts of speech is in progress.
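The context-comparison idea described in the abstract can be illustrated with a minimal overlap-based sketch. This is not the authors' implementation: the inventory `TOY_SENSE_CONTEXTS`, the synset numbers 101 and 102, the English stand-in word "bank", and the function `disambiguate` are all hypothetical, and a plain count of shared words is assumed as the scoring rule, which the abstract does not specify.

```python
from typing import Dict, List, Set

# Hypothetical toy inventory: word -> {synset number -> context words}.
# In the described system these contexts would be built from Hindi Wordnet
# glosses and semantic relations; the entries here are invented placeholders.
TOY_SENSE_CONTEXTS: Dict[str, Dict[int, Set[str]]] = {
    "bank": {
        101: {"money", "deposit", "loan", "account", "institution"},
        102: {"river", "water", "shore", "slope", "land"},
    }
}


def disambiguate(word: str, sentence: List[str],
                 inventory: Dict[str, Dict[int, Set[str]]]) -> int:
    """Return the synset number whose context overlaps most with the sentence.

    Assumption: the score is the number of shared words. Ties are broken in
    favour of the lowest synset number.
    """
    # Context of the target word: all other tokens in the sentence, lowercased.
    context = {tok.lower() for tok in sentence if tok.lower() != word}
    senses = inventory[word]
    # max() keeps the first maximal element, so iterating over sorted ids
    # makes the lowest synset number win a tie.
    return max(sorted(senses), key=lambda sid: len(context & senses[sid]))


if __name__ == "__main__":
    sentence = "He opened an account at the bank to deposit money".split()
    print(disambiguate("bank", sentence, TOY_SENSE_CONTEXTS))
```

A real system would also need tokenization and morphological normalization for Hindi and a way to weight gloss words against words from related synsets; none of that is modeled in this sketch.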
Similar resources
Mining Association Rules Based Approach to Word Sense Disambiguation for Hindi Language
Language currently hinders access to the benefits of the Information Technology revolution in India. Adequate natural language processing (NLP) is therefore needed so that users can interact with computer-based systems in a natural language such as Hindi. This paper presents a new Word Sense Disambiguation method associated wi...
Word Sense Disambiguation in Hindi Language Using Hyperspace Analogue to Language and Fuzzy C-Means Clustering
The problem of Word Sense Disambiguation (WSD) can be defined as the task of assigning the most appropriate sense to the polysemous word within a given context. Many supervised, unsupervised and semi-supervised approaches have been devised to deal with this problem, particularly, for the English language. However, this is not the case for Hindi language, where not much work has been done. In th...
An Investigation to Semi supervised approach for HINDI Word sense disambiguation
This paper investigates the Yarowsky algorithm for Hindi word sense disambiguation. The evaluation was carried out on a manually created sense-tagged corpus consisting of Hindi words (nouns). The sense definitions were obtained from the Hindi WordNet, which is developed at IIT Bombay. The maximum observed precision of 61.7 on 605 test instances corresponds to the case when both ste...
Word Sense Disambiguation in Bengali applied to Bengali-Hindi Machine Translation
We have developed a word sense disambiguation (WSD) system for the Bengali language and applied it to obtain the correct lexical choice in Bengali-Hindi machine translation. We are not aware of any existing system for Bengali WSD. Since there is no sense-annotated Bengali corpus and no sufficiently large parallel corpus for the Bengali-Hindi language pair, we had to use an unsupervised approach. We use a...
Utilizing corpus statistics for hindi word sense disambiguation
Word Sense Disambiguation (WSD) is the task of computational assignment of correct sense of a polysemous word in a given context. This paper compares three WSD algorithms for Hindi WSD based on corpus statistics. The first algorithm, called corpus-based Lesk, uses sense definitions and a sense tagged training corpus to learn weights of Content Words (CWs). These weights are used in the disambig...
Publication date: 2004